
Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation (2404.02378v1)

Published 3 Apr 2024 in math.OC and cs.LG

Abstract: We prove new convergence rates for a generalized version of stochastic Nesterov acceleration under interpolation conditions. Unlike previous analyses, our approach accelerates any stochastic gradient method which makes sufficient progress in expectation. The proof, which proceeds using the estimating sequences framework, applies to both convex and strongly convex functions and is easily specialized to accelerated SGD under the strong growth condition. In this special case, our analysis reduces the dependence on the strong growth constant from $\rho$ to $\sqrt{\rho}$ as compared to prior work. This improvement is comparable to a square root of the condition number in the worst case and addresses criticism that guarantees for stochastic acceleration could be worse than those for SGD.
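For context, the strong growth condition referenced above is typically stated as $\mathbb{E}_i\big[\|\nabla f_i(x)\|^2\big] \le \rho\,\|\nabla f(x)\|^2$ for all $x$; it is the standard noise model for interpolation settings, where every component gradient $\nabla f_i$ vanishes at a minimizer of $f$. The sketch below shows one common form of a stochastic Nesterov update (a stochastic gradient taken at an extrapolated point). It is only an illustrative stand-in: the function name, the `grad_i` oracle, and the fixed `lr` and `momentum` constants are assumptions for illustration, whereas the paper's generalized scheme chooses its parameters via the estimating-sequences construction.

```python
import numpy as np

def stochastic_nesterov_sgd(grad_i, x0, n, lr, momentum, iters, seed=0):
    """Illustrative stochastic Nesterov-style loop (not the paper's exact scheme).

    grad_i(x, i): stochastic gradient of the i-th component f_i at x (assumed oracle)
    x0:           initial iterate (NumPy array)
    n:            number of components in the finite sum
    lr, momentum: fixed step size and momentum; the paper's generalized method
                  derives these quantities from estimating sequences instead
    """
    rng = np.random.default_rng(seed)
    x_prev = np.array(x0, dtype=float)
    x = x_prev.copy()
    for _ in range(iters):
        # Extrapolated point: y_k = x_k + momentum * (x_k - x_{k-1})
        y = x + momentum * (x - x_prev)
        # Draw a component uniformly and step along its gradient at y_k
        i = int(rng.integers(n))
        x_prev, x = x, y - lr * grad_i(y, i)
    return x

if __name__ == "__main__":
    # Tiny interpolated least-squares demo: b = A @ x_true, so every
    # f_i(x) = 0.5 * (A[i] @ x - b[i])**2 is minimized at x_true and the
    # interpolation condition holds exactly.
    rng = np.random.default_rng(1)
    A, x_true = rng.normal(size=(50, 10)), np.ones(10)
    b = A @ x_true
    grad = lambda x, i: (A[i] @ x - b[i]) * A[i]
    x_hat = stochastic_nesterov_sgd(grad, np.zeros(10), n=50,
                                    lr=0.01, momentum=0.9, iters=20000)
    print("distance to x_true:", np.linalg.norm(x_hat - x_true))
```

Concrete step-size and momentum schedules under the strong growth condition depend on $\rho$, the smoothness constant, and (in the strongly convex case) the strong convexity parameter; the paper gives the precise choices that yield the improved $\sqrt{\rho}$ dependence.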

[2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Belkin, M.: Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation. Acta Numer. 30, 203–248 (2021) Arora et al. [2018] Arora, S., Cohen, N., Hazan, E.: On the optimization of deep networks: Implicit acceleration by overparameterization. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 244–253. PMLR (2018) Ma et al. [2018] Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3331–3340. PMLR (2018) Zou and Gu [2019] Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. 
[2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. 
[2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. 
[2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Arora, S., Cohen, N., Hazan, E.: On the optimization of deep networks: Implicit acceleration by overparameterization. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 244–253. PMLR (2018) Ma et al. [2018] Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3331–3340. PMLR (2018) Zou and Gu [2019] Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. 
[2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. 
Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3331–3340. PMLR (2018) Zou and Gu [2019] Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. 
arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. 
SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 
2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. 
[2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. 
[2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. 
[2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 
11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. 
[2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. 
[2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. 
[2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. 
[2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. 
[2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. 
In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. 
PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. 
[2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. 
arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. 
PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. 
[2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. 
[2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. 
PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. 
[2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. 
PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. 
Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. 
[2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. 
[2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. 
Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. 
Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
  2. Belkin, M., Hsu, D., Ma, S., Mandal, S.: Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proceedings of the National Academy of Sciences 116(32), 15849–15854 (2019) Belkin et al. [2019b] Belkin, M., Rakhlin, A., Tsybakov, A.B.: Does data interpolation contradict statistical optimality? In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1611–1619. PMLR (2019) Schapire et al. [1997] Schapire, R.E., Freund, Y., Barlett, P., Lee, W.S.: Boosting the margin: A new explanation for the effectiveness of voting methods. In: Fisher, D.H. (ed.) Proceedings of the Fourteenth International Conference on Machine Learning (ICML 1997), pp. 322–330. Morgan Kaufmann (1997) Liu et al. [2022] Liu, C., Zhu, L., Belkin, M.: Loss landscapes and optimization in over-parameterized non-linear systems and neural networks. Applied and Computational Harmonic Analysis 59, 85–116 (2022) Oymak and Soltanolkotabi [2019] Oymak, S., Soltanolkotabi, M.: Overparameterized nonlinear learning: Gradient descent takes the shortest path? In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA. Proceedings of Machine Learning Research, vol. 97, pp. 4951–4960. PMLR (2019) Belkin [2021] Belkin, M.: Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation. Acta Numer. 30, 203–248 (2021) Arora et al. [2018] Arora, S., Cohen, N., Hazan, E.: On the optimization of deep networks: Implicit acceleration by overparameterization. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 244–253. PMLR (2018) Ma et al. [2018] Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3331–3340. PMLR (2018) Zou and Gu [2019] Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. 
[2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. 
Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Belkin, M., Rakhlin, A., Tsybakov, A.B.: Does data interpolation contradict statistical optimality? In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1611–1619. PMLR (2019) Schapire et al. [1997] Schapire, R.E., Freund, Y., Barlett, P., Lee, W.S.: Boosting the margin: A new explanation for the effectiveness of voting methods. In: Fisher, D.H. (ed.) Proceedings of the Fourteenth International Conference on Machine Learning (ICML 1997), pp. 322–330. Morgan Kaufmann (1997) Liu et al. [2022] Liu, C., Zhu, L., Belkin, M.: Loss landscapes and optimization in over-parameterized non-linear systems and neural networks. Applied and Computational Harmonic Analysis 59, 85–116 (2022) Oymak and Soltanolkotabi [2019] Oymak, S., Soltanolkotabi, M.: Overparameterized nonlinear learning: Gradient descent takes the shortest path? In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA. Proceedings of Machine Learning Research, vol. 97, pp. 4951–4960. PMLR (2019) Belkin [2021] Belkin, M.: Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation. Acta Numer. 30, 203–248 (2021) Arora et al. [2018] Arora, S., Cohen, N., Hazan, E.: On the optimization of deep networks: Implicit acceleration by overparameterization. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 244–253. PMLR (2018) Ma et al. [2018] Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3331–3340. PMLR (2018) Zou and Gu [2019] Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. 
[2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. 
[2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. 
PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. 
[2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. 
[2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. 
[2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. 
[2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 
1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 
19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. 
[2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 
11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. 
In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) 
[2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 
11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. 
In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. 
PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. 
In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. 
icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 
1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schapire, R.E., Freund, Y., Barlett, P., Lee, W.S.: Boosting the margin: A new explanation for the effectiveness of voting methods. In: Fisher, D.H. (ed.) Proceedings of the Fourteenth International Conference on Machine Learning (ICML 1997), pp. 322–330. Morgan Kaufmann (1997) Liu et al. [2022] Liu, C., Zhu, L., Belkin, M.: Loss landscapes and optimization in over-parameterized non-linear systems and neural networks. Applied and Computational Harmonic Analysis 59, 85–116 (2022) Oymak and Soltanolkotabi [2019] Oymak, S., Soltanolkotabi, M.: Overparameterized nonlinear learning: Gradient descent takes the shortest path? In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA. Proceedings of Machine Learning Research, vol. 97, pp. 4951–4960. PMLR (2019) Belkin [2021] Belkin, M.: Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation. Acta Numer. 30, 203–248 (2021) Arora et al. [2018] Arora, S., Cohen, N., Hazan, E.: On the optimization of deep networks: Implicit acceleration by overparameterization. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 244–253. PMLR (2018) Ma et al. [2018] Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3331–3340. PMLR (2018) Zou and Gu [2019] Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. 
[2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. 
Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Liu, C., Zhu, L., Belkin, M.: Loss landscapes and optimization in over-parameterized non-linear systems and neural networks. Applied and Computational Harmonic Analysis 59, 85–116 (2022) Oymak and Soltanolkotabi [2019] Oymak, S., Soltanolkotabi, M.: Overparameterized nonlinear learning: Gradient descent takes the shortest path? In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA. Proceedings of Machine Learning Research, vol. 97, pp. 4951–4960. PMLR (2019) Belkin [2021] Belkin, M.: Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation. Acta Numer. 30, 203–248 (2021) Arora et al. [2018] Arora, S., Cohen, N., Hazan, E.: On the optimization of deep networks: Implicit acceleration by overparameterization. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 244–253. PMLR (2018) Ma et al. [2018] Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3331–3340. PMLR (2018) Zou and Gu [2019] Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. 
[2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. 
Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Oymak, S., Soltanolkotabi, M.: Overparameterized nonlinear learning: Gradient descent takes the shortest path? In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA. Proceedings of Machine Learning Research, vol. 97, pp. 4951–4960. PMLR (2019) Belkin [2021] Belkin, M.: Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation. Acta Numer. 30, 203–248 (2021) Arora et al. [2018] Arora, S., Cohen, N., Hazan, E.: On the optimization of deep networks: Implicit acceleration by overparameterization. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 244–253. PMLR (2018) Ma et al. [2018] Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3331–3340. PMLR (2018) Zou and Gu [2019] Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. 
[2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. 
In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. 
PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Belkin, M.: Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation. Acta Numer. 30, 203–248 (2021) Arora et al. [2018] Arora, S., Cohen, N., Hazan, E.: On the optimization of deep networks: Implicit acceleration by overparameterization. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 244–253. PMLR (2018) Ma et al. [2018] Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3331–3340. PMLR (2018) Zou and Gu [2019] Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. 
[2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 
11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Arora, S., Cohen, N., Hazan, E.: On the optimization of deep networks: Implicit acceleration by overparameterization. In: Dy, J.G., Krause, A. (eds.) 
Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 244–253. PMLR (2018) Ma et al. [2018] Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3331–3340. PMLR (2018) Zou and Gu [2019] Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. 
PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 
19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3331–3340. PMLR (2018) Zou and Gu [2019] Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. 
CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. 
In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. 
Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. 
In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. 
Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 
11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. 
[2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. 
[2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). 
In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. 
[2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. 
[2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. 
[2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. 
(eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. 
In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. 
PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. 
PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. 
OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. 
arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. 
In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. 
In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. 
arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. 
[2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) 
Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. 
[2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. 
[2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. 
[2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. 
[2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. 
In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. 
PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. 
[2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. 
PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. 
CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. 
In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. 
Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. 
[2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. 
arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. 
[2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. 
PMLR (2018)
Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020)
Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on Nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021)
Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022)
Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov's accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020)
Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Matematicheskie Metody 24(3), 509–517 (1988)
Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966)
Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997)
Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012)
Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020)
Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020)
D'Orazio et al. [2021] D'Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic Polyak stepsize. CoRR abs/2110.15412 (2021)
Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019)
Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019)
Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985)
Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019)
Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence $O(1/k^2)$. In: Doklady AN USSR, vol. 269, pp. 543–547 (1983)
Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020)
Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference on Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018)
Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004)
Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022)
Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020)
Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011)
Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020)
Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: NeurIPS 2021, pp. 21581–21591 (2021)
Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021. OpenReview.net (2021)
Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998)
Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998)
Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013)
Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011)
d'Aspremont [2008] d'Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008)
Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014)
Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018)
In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. 
In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. 
PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. 
[2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) 
Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. 
Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. 
arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. 
[2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. 
In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. 
[2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. 
PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. 
[2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. 
[2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 
11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) 
Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) 
The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. 
[2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. 
[2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. 
[2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. 
PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 
543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. 
[2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 
543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. 
[2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) 
Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. 
Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. 
arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. 
[2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. 
In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. 
[2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
  6. Oymak, S., Soltanolkotabi, M.: Overparameterized nonlinear learning: Gradient descent takes the shortest path? In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA. Proceedings of Machine Learning Research, vol. 97, pp. 4951–4960. PMLR (2019) Belkin [2021] Belkin, M.: Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation. Acta Numer. 30, 203–248 (2021) Arora et al. [2018] Arora, S., Cohen, N., Hazan, E.: On the optimization of deep networks: Implicit acceleration by overparameterization. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 244–253. PMLR (2018) Ma et al. [2018] Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3331–3340. PMLR (2018) Zou and Gu [2019] Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. 
arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. 
arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. 
[2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. 
PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. 
PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 
19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) 
The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) 
Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) 
The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) 
Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. 
PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 
19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. 
OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 
80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. 
Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. 
[2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
  7. Belkin, M.: Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation. Acta Numer. 30, 203–248 (2021)
  8. Arora, S., Cohen, N., Hazan, E.: On the optimization of deep networks: Implicit acceleration by overparameterization. In: Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 244–253. PMLR (2018)
  9. Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3331–3340. PMLR (2018)
  10. Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019)
  11. Polyak, B.T.: Introduction to Optimization (1987)
  12. Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018)
  13. Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019)
  14. Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019)
  15. Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020)
  16. Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020)
  17. D'Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic Polyak stepsize. CoRR abs/2110.15412 (2021)
  18. Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019)
  19. Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019)
  20. Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985)
  21. Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019)
  22. Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence $O(1/k^2)$. In: Doklady AN USSR, vol. 269, pp. 543–547 (1983)
  23. Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020)
  24. Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018)
  25. Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004)
  26. Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022)
  27. Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020)
  28. Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011)
  29. Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020)
  30. Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Advances in Neural Information Processing Systems 34: NeurIPS 2021, pp. 21581–21591 (2021)
  31. Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021. OpenReview.net (2021)
  32. Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998)
  33. Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998)
  34. Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013)
  35. Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011)
  36. d'Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008)
  37. Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014)
  38. Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018)
  39. Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020)
  40. Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on Nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021)
  41. Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022)
  42. Assran, M., Rabbat, M.: On the convergence of Nesterov's accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020)
  43. Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Matematicheskie Metody 24(3), 509–517 (1988)
  44. Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966)
  45. Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997)
  46. Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012)
  47. Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020)
  48. Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: International Conference on Machine Learning, ICML 2022. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. 
Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 
1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. 
arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. 
[2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. 
icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. 
[2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. 
[2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 
543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. 
[2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. 
[2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. 
[2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. 
arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. 
SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. 
arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. 
In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. 
arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. 
[2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) 
Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. 
[2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. 
[2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. 
In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
  8. Arora, S., Cohen, N., Hazan, E.: On the optimization of deep networks: Implicit acceleration by overparameterization. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 244–253. PMLR (2018) Ma et al. [2018] Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3331–3340. PMLR (2018) Zou and Gu [2019] Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. 
PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 
19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Ma, S., Bassily, R., Belkin, M.: The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 3331–3340. PMLR (2018) Zou and Gu [2019] Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. 
In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. 
Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. 
In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. 
Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 
11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. 
[2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. 
[2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). 
In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. 
[2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. 
[2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. 
[2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. 
(eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. 
In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. 
PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. 
PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. 
OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. 
arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. 
In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. 
In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. 
arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. 
[2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) 
Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. 
[2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. 
[2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. 
[2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. 
[2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). 
arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. 
arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. 
In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 
21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. 
In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. 
In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. 
[2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. 
[2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. 
[2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
  10. Zou, D., Gu, Q.: An improved analysis of training over-parameterized deep neural networks. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 2053–2062 (2019) Polyak [1987] Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) 
Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. 
[2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Polyak, B.T.: Introduction to optimization (1987) Bassily et al. [2018] Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. 
CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. 
In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. 
Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018) Vaswani et al. [2019] Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. 
[2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. 
[2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. 
USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. 
arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. 
[2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. 
In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. 
PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. 
[2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011)
Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020)
Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021)
Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021)
Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998)
Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998)
Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013)
Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011)
d'Aspremont [2008] d'Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008)
Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014)
Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018)
Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020)
Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on Nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021)
Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022)
Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov's accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020)
Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Matematicheskie Metody 24(3), 509–517 (1988)
Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966)
Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997)
Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012)
Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020)
Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020)
D'Orazio et al. [2021] D'Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic Polyak stepsize. CoRR abs/2110.15412 (2021)
Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019)
Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019)
Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985)
Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019)
Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2). In: Doklady AN USSR, vol. 269, pp. 543–547 (1983)
Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020)
Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018)
Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004)
Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022)
Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020)
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. 
[2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. 
[2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. 
arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. 
[2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. 
[2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. 
[2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. 
[2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. 
Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. 
Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. 
[2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 
11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) 
The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) 
Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. 
PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. 
[2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. 
[2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. 
icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 
1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. 
[2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) 
Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. 
PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. 
arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. 
[2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. 
In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. 
PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. 
[2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. 
arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. 
PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. 
[2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. 
[2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. 
PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. 
[2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. 
PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. 
Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. 
[2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. 
[2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. 
Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. 
Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
  13. Vaswani, S., Mishkin, A., Laradji, I.H., Schmidt, M., Gidel, G., Lacoste-Julien, S.: Painless stochastic gradient: Interpolation, line-search, and convergence rates. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 3727–3740 (2019) Defazio and Bottou [2019] Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. 
[2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Defazio, A., Bottou, L.: On the ineffectiveness of variance reduced optimization for deep learning. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: NeurIPS 2019, pp. 1753–1763 (2019) Loizou et al. [2020] Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. 
PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 
19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020) Berrada et al. [2020] Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020) D’Orazio et al. [2021] D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize. CoRR abs/2110.15412 (2021) Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019) Arjevani et al. 
[2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019) Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 
[2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. 
[2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 
11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. 
In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. 
PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. 
In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. 
icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 
1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
In: Advances in Neural Information Processing Systems 32 (NeurIPS 2019), pp. 1753–1763 (2019)
Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020)
Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning (ICML 2020). Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020)
D'Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic Polyak stepsize. CoRR abs/2110.15412 (2021)
Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019)
Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019)
Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985)
Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: The 22nd International Conference on Artificial Intelligence and Statistics (AISTATS 2019). Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019)
Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence $O(1/k^2)$. In: Doklady AN USSR, vol. 269, pp. 543–547 (1983)
Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations (ICLR 2020). OpenReview.net (2020)
Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Conference On Learning Theory (COLT 2018). Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018)
Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004)
Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Operations Research Letters 50(2), 184–189 (2022)
Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020)
Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12, 2121–2159 (2011)
Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: The 23rd International Conference on Artificial Intelligence and Statistics (AISTATS 2020). Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020)
Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Advances in Neural Information Processing Systems 34 (NeurIPS 2021), pp. 21581–21591 (2021)
Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations (ICLR 2021). OpenReview.net (2021)
Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Computational Optimization and Applications 11(1), 23–35 (1998)
Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998)
Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013)
Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Advances in Neural Information Processing Systems 24 (NeurIPS 2011), pp. 1458–1466 (2011)
d'Aspremont, A.: Smooth optimization with approximate gradient. SIAM Journal on Optimization 19(3), 1171–1183 (2008)
Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014)
Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Proceedings of the 35th International Conference on Machine Learning (ICML 2018). Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018)
Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020)
Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on Nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021)
Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022)
Assran, M., Rabbat, M.: On the convergence of Nesterov's accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020)
Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Matematicheskie Metody 24(3), 509–517 (1988)
Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966)
Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997)
Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning (ICML 2012). icml.cc / Omnipress (2012)
Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020)
Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: International Conference on Machine Learning (ICML 2022). Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) 
Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. 
[2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. 
SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. 
[2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. 
In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. 
In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. 
PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. 
[2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) 
Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. 
Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018)
  15. Loizou, N., Vaswani, S., Laradji, I., Lacoste-Julien, S.: Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence. arXiv preprint arXiv:2002.10542 (2020)
  16. Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020)
  17. D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic Polyak stepsize. CoRR abs/2110.15412 (2021)
  18. Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019)
  19. Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019)
  20. Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985)
  21. Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019)
  22. Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence $O(1/k^2)$. In: Doklady AN USSR, vol. 269, pp. 543–547 (1983)
  23. Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020)
  24. Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018)
  25. Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004)
  26. Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022)
  27. Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020)
  28. Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011)
  29. Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020)
  30. Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Advances in Neural Information Processing Systems 34: NeurIPS 2021, pp. 21581–21591 (2021)
  31. Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021. OpenReview.net (2021)
  32. Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998)
  33. Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998)
  34. Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013)
  35. Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011)
  36. d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008)
  37. Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014)
  38. Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018)
  39. Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020)
  40. Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on Nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021)
  41. Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022)
  42. Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020)
  43. Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Matematicheskie Metody 24(3), 509–517 (1988)
  44. Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966)
  45. Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997)
  46. Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012)
  47. Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020)
  48. Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: International Conference on Machine Learning, ICML 2022. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. 
[2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. 
icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. 
SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. 
In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. 
arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. 
In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. 
PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) 
Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) 
Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. 
[2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. 
arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. 
In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. 
PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. 
arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. 
arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. 
[2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. 
arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. 
icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
16. Berrada, L., Zisserman, A., Kumar, M.P.: Training neural networks for and by interpolation. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020. Proceedings of Machine Learning Research, vol. 119, pp. 799–809. PMLR (2020)
17. D’Orazio, R., Loizou, N., Laradji, I.H., Mitliagkas, I.: Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic Polyak stepsize. CoRR abs/2110.15412 (2021)
18. Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019)
19. Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019)
20. Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985)
21. Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019)
22. Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2). In: Doklady AN USSR, vol. 269, pp. 543–547 (1983)
23. Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020)
24. Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018)
25. Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004)
26. Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022)
27. Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020)
28. Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011)
29. Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020)
30. Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: NeurIPS 2021, pp. 21581–21591 (2021)
31. Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021. OpenReview.net (2021)
32. Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998)
33. Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998)
34. Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013)
35. Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011)
36. d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008)
37. Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014)
38. Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018)
39. Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020)
40. Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on Nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021)
41. Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022)
42. Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020)
43. Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Matematicheskie Metody 24(3), 509–517 (1988)
44. Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966)
45. Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997)
46. Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012)
47. Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020)
48. Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. 
OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. 
[2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. 
[2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. 
PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. 
In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. 
[2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). 
arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. 
arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. 
[2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. 
[2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
PMLR (2022) Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985) Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. 
SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. 
PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 
19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. 
[2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. 
[2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) 
Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. 
[2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) 
The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. 
[2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) 
Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. 
[2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. 
Asi and Duchi [2019] Asi, H., Duchi, J.C.: Stochastic (approximate) proximal point methods: Convergence, optimality, and adaptivity. SIAM Journal on Optimization 29(3), 2257–2290 (2019)
Arjevani et al. [2019] Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv preprint arXiv:1912.02365 (2019)
Nemirovsky and Nesterov [1985] Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985)
Vaswani et al. [2019] Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019)
Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2). In: Doklady AN USSR, vol. 269, pp. 543–547 (1983)
Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020)
Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018)
Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004)
Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022)
Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020)
[2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. 
[2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. 
[2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. 
(eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. 
In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. 
PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. 
PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. 
OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. 
arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. 
In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. 
(eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. 
In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. 
PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. 
PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. 
OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. 
arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. 
In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. 
In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. 
arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. 
[2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) 
Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. 
[2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. 
[2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. 
In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
20. Nemirovsky, A.S., Nesterov, Y.E.: Optimal methods of smooth convex minimization. USSR Computational Mathematics and Mathematical Physics 25(2), 21–30 (1985)
21. Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019)
22. Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2). In: Doklady AN USSR, vol. 269, pp. 543–547 (1983)
23. Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020)
24. Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018)
25. Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004)
26. Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022)
27. Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020)
28. Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011)
29. Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020)
30. Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: NeurIPS 2021, pp. 21581–21591 (2021)
31. Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021. OpenReview.net (2021)
32. Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998)
33. Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998)
34. Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013)
35. Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011)
36. d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008)
37. Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014)
38. Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018)
39. Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020)
40. Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on Nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021)
41. Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022)
42. Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020)
43. Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Matematicheskie Metody 24(3), 509–517 (1988)
44. Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966)
45. Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997)
46. Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012)
47. Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020)
48. Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. 
SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. 
OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. 
[2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. 
Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. 
[2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. 
In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. 
[2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 
162, pp. 22015–22059. PMLR (2022) Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
  21. Vaswani, S., Bach, F., Schmidt, M.W.: Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019. Proceedings of Machine Learning Research, vol. 89, pp. 1195–1204. PMLR (2019) Nesterov [1983] Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O⁢(1/k2)𝑂1superscript𝑘2{O}(1/k^{2})italic_O ( 1 / italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). In: Doklady an USSR, vol. 269, pp. 543–547 (1983) Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020) Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018) Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004) Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. 
PMLR (2022) Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
22. Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2). In: Doklady AN USSR, vol. 269, pp. 543–547 (1983)
23. Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020)
24. Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018)
25. Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004)
26. Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022)
27. Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020)
28. Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011)
29. Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020)
30. Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: NeurIPS 2021, pp. 21581–21591 (2021)
31. Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021. OpenReview.net (2021)
32. Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998)
33. Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998)
34. Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013)
35. Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011)
36. d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008)
37. Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014)
38. Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018)
39. Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020)
40. Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on Nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021)
41. Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022)
42. Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020)
43. Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Matematicheskie Metody 24(3), 509–517 (1988)
44. Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966)
45. Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997)
46. Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012)
47. Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020)
48. Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. 
Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. 
icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. 
CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. 
PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. 
Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
Liu and Belkin [2020] Liu, C., Belkin, M.: Accelerating SGD with momentum for over-parameterized learning. In: 8th International Conference on Learning Representations, ICLR 2020. OpenReview.net (2020)
Jain et al. [2018] Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018)
Nesterov [2004] Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004)
Xiao et al. [2022] Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022)
Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020)
Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models.
In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. 
[2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. 
SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 
108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. 
In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. 
arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. 
arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. 
Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. 
[2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. 
arXiv preprint arXiv:2002.12414 (2020)
24. Jain, P., Kakade, S.M., Kidambi, R., Netrapalli, P., Sidford, A.: Accelerating stochastic gradient descent for least squares regression. In: Bubeck, S., Perchet, V., Rigollet, P. (eds.) Conference On Learning Theory, COLT 2018. Proceedings of Machine Learning Research, vol. 75, pp. 545–604. PMLR (2018)
25. Nesterov, Y.E.: Introductory Lectures on Convex Optimization - A Basic Course. Applied Optimization, vol. 87. Springer (2004)
26. Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022)
27. Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020)
28. Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011)
29. Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020)
30. Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: NeurIPS 2021, pp. 21581–21591 (2021)
31. Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021. OpenReview.net (2021)
32. Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998)
33. Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998)
34. Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013)
35. Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011)
36. d'Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008)
37. Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014)
38. Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018)
39. Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020)
40. Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on Nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021)
41. Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022)
42. Assran, M., Rabbat, M.: On the convergence of Nesterov's accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020)
43. Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Matematicheskie Metody 24(3), 509–517 (1988)
44. Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966)
45. Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997)
46. Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012)
47. Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020)
48. Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. 
In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
[2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022) Vaswani et al. [2020] Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. 
OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 
1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. 
[2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. 
Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. 
icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. 
CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. 
PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. 
Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
26. Xiao, T., Balasubramanian, K., Ghadimi, S.: Improved complexities for stochastic conditional gradient methods under interpolation-like conditions. Oper. Res. Lett. 50(2), 184–189 (2022)
27. Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020)
28. Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011)
29. Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020)
30. Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: NeurIPS 2021, pp. 21581–21591 (2021)
31. Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021. OpenReview.net (2021)
32. Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998)
33. Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998)
34. Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013)
35. Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011)
36. d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008)
37. Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014)
38. Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018)
39. Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020)
40. Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on Nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021)
41. Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022)
42. Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020)
43. Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Matematicheskie Metody 24(3), 509–517 (1988)
44. Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966)
45. Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997)
46. Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012)
47. Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020)
48. Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
  27. Vaswani, S., Kunstner, F., Laradji, I., Meng, S.Y., Schmidt, M., Lacoste-Julien, S.: Adaptive gradient methods converge faster with over-parameterization (and you can do a line-search). arXiv preprint arXiv:2006.06835 (2020) Duchi et al. [2011] Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) Meng et al. [2020] Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 
1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020) Varre et al. [2021] Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. 
[2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. 
Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. 
Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. 
icml.cc / Omnipress (2012)
  28. Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011)
  29. Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020)
  30. Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021)
  31. Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021)
  32. Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998)
  33. Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998)
  34. Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013)
  35. Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011)
  36. d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008)
  37. Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014)
  38. Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018)
  39. Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020)
  40. Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on Nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021)
  41. Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022)
  42. Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020)
  43. Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Matematicheskie Metody 24(3), 509–517 (1988)
  44. Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966)
  45. Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997)
  46. Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012)
  47. Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020)
  48. Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
PMLR (2022) Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
  29. Meng, S.Y., Vaswani, S., Laradji, I.H., Schmidt, M., Lacoste-Julien, S.: Fast and furious convergence: Stochastic second order methods under interpolation. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020. Proceedings of Machine Learning Research, vol. 108, pp. 1375–1386. PMLR (2020)
  30. Varre, A.V., Pillaud-Vivien, L., Flammarion, N.: Last iterate convergence of SGD for least-squares in the interpolation regime. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 21581–21591 (2021) Fang et al. [2021] Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021) Solodov [1998] Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. 
  31. Fang, H., Fan, Z., Friedlander, M.P.: Fast convergence of stochastic subgradient method under interpolation. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net (2021)
(eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. 
Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
  32. Solodov, M.V.: Incremental gradient algorithms with stepsizes bounded away from zero. Comp. Opt. and Appl. 11(1), 23–35 (1998) Tseng [1998] Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. 
[2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. 
Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. 
[2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. 
In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. 
[2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 
162, pp. 22015–22059. PMLR (2022) Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
  33. Tseng, P.: An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 8(2), 506–531 (1998) Schmidt and Le Roux [2013] Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. [2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013) Schmidt et al. 
[2011] Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011) d’Aspremont [2008] d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. 
Mathematical Programming 146(1-2), 37–75 (2014)
  34. Schmidt, M., Le Roux, N.: Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370 (2013)
  35. Schmidt, M., Le Roux, N., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24: NeurIPS 2011, pp. 1458–1466 (2011)
  36. d’Aspremont, A.: Smooth optimization with approximate gradient. SIAM J. Optim. 19(3), 1171–1183 (2008) Devolder et al. [2014] Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. 
CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. 
PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. 
[2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. 
Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
  37. Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming 146(1-2), 37–75 (2014) Cohen et al. [2018] Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. 
arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022)
  38. Cohen, M., Diakonikolas, J., Orecchia, L.: On acceleration with noise-corrupted gradients. In: Dy, J.G., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018. Proceedings of Machine Learning Research, vol. 80, pp. 1018–1027. PMLR (2018) Chen et al. [2020] Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Chen, Y.-L., Na, S., Kolar, M.: Convergence analysis of accelerated stochastic gradient descent under the growth condition. arXiv preprint arXiv:2006.06782 (2020) Even et al. [2021] Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Even, M., Berthier, R., Bach, F.R., Flammarion, N., Gaillard, P., Hendrikx, H., Massoulié, L., Taylor, A.B.: A continuized view on nesterov acceleration for stochastic gradient descent and randomized gossip. CoRR abs/2106.07644 (2021) Valls et al. [2022] Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Valls, V., Wang, S., Jiang, Y., Tassiulas, L.: Accelerated convex optimization with stochastic gradients: Generalizing the strong-growth condition. arXiv preprint arXiv:2207.11833 (2022) Assran and Rabbat [2020] Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. 
Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Assran, M., Rabbat, M.: On the convergence of Nesterov’s accelerated gradient method in stochastic settings. arXiv preprint arXiv:2002.12414 (2020) Nesterov [1988] Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Nesterov, Y.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomika i Mateaticheskie Metody 24(3), 509–517 (1988) Armijo [1966] Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. 
PMLR (2022) Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics 16(1), 1–3 (1966) Bertsekas [1997] Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Bertsekas, D.P.: Nonlinear programming. Journal of the Operational Research Society 48(3), 334–334 (1997) Honorio [2012] Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Honorio, J.: Convergence rates of biased stochastic optimization for learning sparse Ising models. In: Proceedings of the 29th International Conference on Machine Learning, ICML 2012. icml.cc / Omnipress (2012) Mishkin [2020] Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Mishkin, A.: Interpolation, growth conditions, and stochastic gradient descent. PhD thesis, University of British Columbia (2020) Vaswani et al. [2022] Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 22015–22059. PMLR (2022) Vaswani, S., Dubois-Taine, B., Babanezhad, R.: Towards noise-adaptive, problem-adaptive (accelerated) stochastic gradient descent. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvári, C., Niu, G., Sabato, S. (eds.) 
Authors (3)
  1. Aaron Mishkin (12 papers)
  2. Mert Pilanci (102 papers)
  3. Mark Schmidt (74 papers)

Summary

Faster Convergence Rates for Stochastic Nesterov Acceleration under Interpolation

Convergence Under Interpolation Conditions

Recent advances in deep learning have highlighted the efficacy of over-parameterized models that interpolate the training data, fitting every example exactly. This paper, authored by Mishkin, Pilanci, and Schmidt, sharpens our understanding of stochastic acceleration in this regime, focusing on stochastic accelerated gradient descent (stochastic AGD). The authors give an improved analysis of stochastic AGD under the interpolation condition and show that, in this setting, stochastic algorithms can achieve accelerated convergence rates comparable to those of their deterministic counterparts.
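
To make the setting concrete, a standard way the interpolation and strong growth conditions are formalized in this literature is shown below; the paper's exact assumptions may differ in their details, so treat this as a reference sketch rather than the authors' precise statement. For a finite-sum objective with minimizer \(w^\ast\),

\[
f(w) = \frac{1}{n}\sum_{i=1}^{n} f_i(w), \qquad \nabla f_i(w^\ast) = 0 \ \text{ for all } i \quad \text{(interpolation)},
\]
\[
\mathbb{E}_i\big[\|\nabla f_i(w)\|^2\big] \;\le\; \rho\,\|\nabla f(w)\|^2 \ \text{ for all } w \quad \text{(strong growth with constant } \rho\text{)}.
\]

Interpolation says every individual loss is already stationary at the shared minimizer, so stochastic gradients vanish at the solution; strong growth bounds the second moment of the stochastic gradient by the full gradient, which is what permits constant step sizes without variance reduction.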

Theoretical Insights

The authors build on the interpolation condition, extending Nesterov's accelerated method to generic stochastic gradient oracles. The key contribution is a generalized analysis that establishes accelerated convergence rates for both convex and strongly convex functions. A notable feature of the analysis is that the resulting rates depend on the square root of the strong growth constant rather than on the constant itself, an improvement over previous work.
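
For orientation, the sketch below shows a generic stochastic Nesterov iteration of the kind such analyses cover: an extrapolation (momentum) point, a stochastic gradient evaluated there, and a gradient step. The function name, the step size 1/(rho*L), and the momentum schedule are illustrative textbook choices, not the paper's generalized scheme or its exact parameter sequences.

```python
import numpy as np

def stochastic_agd(grad_i, w0, L, rho, n, steps, seed=0):
    """Generic stochastic Nesterov acceleration (illustrative sketch only).

    grad_i(w, i) returns the stochastic gradient of the i-th component at w;
    L is a smoothness constant and rho a strong growth constant. The step
    size 1/(rho*L) is a common conservative choice under strong growth.
    """
    rng = np.random.default_rng(seed)
    eta = 1.0 / (rho * L)
    w_prev, w = w0.copy(), w0.copy()
    for k in range(steps):
        beta = k / (k + 3)           # classical momentum schedule (convex case)
        y = w + beta * (w - w_prev)  # extrapolation / momentum point
        i = int(rng.integers(n))     # sample one component uniformly at random
        g = grad_i(y, i)             # stochastic gradient at the extrapolated point
        w_prev, w = w, y - eta * g   # gradient step taken from y
    return w
```

For a least-squares problem, for example, grad_i could be `lambda w, i: A[i] * (A[i] @ w - b[i])`, with L and rho estimated from the data matrix.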

Practical Implications and Outlook

Although the paper is primarily theoretical, its findings point toward practical gains in training deep neural networks. By tightening the guarantees for accelerated stochastic gradient methods in the over-parameterized regime, the work suggests that large-scale models could be trained in fewer iterations. Future research may examine how well these theoretical improvements translate into practical speedups for stochastic acceleration.

Comparison with Previous Work

The paper highlights that existing analyses under the strong growth condition exhibit linear dependence on the strong growth constant, which can make the guarantees for stochastic acceleration worse than those for plain SGD. The authors' analysis instead yields a square-root dependence on the growth constant, improving on previous bounds and ensuring that the accelerated stochastic method delivers genuine acceleration. This comparison underscores the novelty of the contribution and sets a sharper benchmark for analyses of stochastic acceleration.
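
A rough worked comparison illustrates the gap. In the worst case the strong growth constant can scale like the condition number \(\kappa = L/\mu\); the exponents below are schematic, omit constants and logarithmic factors, and should be checked against the paper's actual theorems:

\[
\text{linear dependence: } \mathcal{O}\!\big(\rho\sqrt{\kappa}\big), \qquad \text{square-root dependence: } \mathcal{O}\!\big(\sqrt{\rho}\,\sqrt{\kappa}\big).
\]

With \(\rho \approx \kappa = 10^4\), the first scales like \(\kappa^{3/2} = 10^6\) while the second scales like \(\kappa = 10^4\), roughly a factor of \(\sqrt{\kappa} = 100\) fewer iterations in this schematic worst case.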

Acceleration with Preconditioning

An interesting extension discussed is the applicability of the proposed analysis to stochastic AGD with full-matrix preconditioning. Preconditioning reshapes the geometry of the optimization problem, and an appropriately chosen preconditioner can further improve the convergence rate. This opens several directions for further exploration, particularly for optimizing the training of deep neural networks.
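
As a minimal sketch of what a preconditioned accelerated step can look like, assuming a fixed symmetric positive definite matrix P (the paper's precise requirements on the preconditioner are not reproduced here, and the function and argument names are illustrative):

```python
import numpy as np

def preconditioned_agd_step(w, w_prev, grad_i, i, P, eta, beta):
    """One stochastic accelerated step in the geometry induced by P.

    The Euclidean update y - eta * g becomes y - eta * P^{-1} g; solving
    the linear system avoids forming P^{-1} explicitly.
    """
    y = w + beta * (w - w_prev)               # extrapolation / momentum point
    g = grad_i(y, i)                          # stochastic gradient at y
    w_next = y - eta * np.linalg.solve(P, g)  # preconditioned gradient step
    return w_next, w                          # new iterate, previous iterate
```

When P approximates the curvature of the problem (for instance the Hessian or its diagonal), the effective condition number seen by the method shrinks, which is the sense in which a well-chosen preconditioner can sharpen the rates discussed above.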

Conclusion and Future Directions

Mishkin, Pilanci, and Schmidt have made a compelling case for the accelerated convergence of stochastic gradient methods under interpolation. This work not only refines our theoretical understanding but also holds promise for substantial practical impacts on training deep learning models. Looking ahead, several avenues for future research emerge, including the exploration of stochastic AGD under relaxed conditions and the development of adaptive methods to leverage the findings in a broader range of applications.

In summary, this paper makes a significant theoretical advancement by establishing improved convergence rates for stochastic accelerated gradient methods under interpolation, offering potential pathways to more efficient algorithmic frameworks in machine learning.